52 research outputs found
Block-based execution on an integrated vector-scalar in-order core
In the low-end mobile processor market, power, energy and area budgets are significantly lower than in the server/desktop/laptop/high-end mobile markets. It has been shown that vector processors are a highly energy-efficient way to increase performance, but adding support for them incurs area and power overheads that may not be acceptable for low-end mobile processors. In this work, we propose an integrated vector-scalar design that mostly reuses scalar hardware to support the execution of vector instructions. The key element of the design is our proposed block-based model of execution that groups vector instructions to execute them in a coordinated manner.
Rapid Evaluation of Requirements for Vector Micro-Architectures
Power consumption has become one of the dominant issues in processor design, and it is especially important in embedded systems and data centers. One possible solution that can address this issue, while providing higher performance for existing applications and new capabilities for future applications used in hand-held devices and data centers, is to use a vector processor. This thesis presents the design and implementation of a vector library that enables the vectorization of the target applications and allows them to be characterized. We also present the ETModel, a simple trace-driven simulator for vector processors. It is used to analyse the micro-architectural requirements of the vectorized applications. We show that the target applications are highly vectorizable, with a degree of vectorization ranging from 62.9% for H264ref to 91% for ECLAT. Detailed instruction-level characteristics, such as the distribution of vector instructions and the distribution of vector lengths, are also presented in the thesis. The thesis contains a detailed timing analysis of the vectorized applications for different micro-architectural configurations of a vector processor. We measured the execution time for different configurations of the cache hierarchy, main memory latencies, maximum vector lengths and functional units, as well as the usage of the functional units. All of this helps in understanding the behavior of the vectorized applications and the requirements of a vector micro-architecture.
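As an illustration of the "degree of vectorization" metric mentioned in the abstract above, the following is a minimal sketch of how such a percentage could be computed from an instruction trace. It is not the ETModel and not the thesis's vector library. The `TraceRecord` type, its fields, and the definition used here (each vector instruction counts as `vector_length` operations, each scalar instruction as one) are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TraceRecord:
    """One dynamic instruction from a hypothetical trace (assumed format)."""
    is_vector: bool
    vector_length: int = 1  # elements processed; ignored for scalar records


def degree_of_vectorization(trace: Iterable[TraceRecord]) -> float:
    """Return the percentage of operations performed by vector instructions.

    Assumed definition: a vector instruction contributes `vector_length`
    operations and a scalar instruction contributes one operation.
    """
    vector_ops = 0
    scalar_ops = 0
    for rec in trace:
        if rec.is_vector:
            vector_ops += rec.vector_length
        else:
            scalar_ops += 1
    total = vector_ops + scalar_ops
    # An empty trace has no operations, so report 0% rather than divide by zero.
    return 100.0 * vector_ops / total if total else 0.0


# Toy trace: 10 scalar instructions and 2 vector instructions of length 45.
toy_trace = [TraceRecord(False)] * 10 + [TraceRecord(True, 45)] * 2
print(f"{degree_of_vectorization(toy_trace):.1f}%")
```

Under this assumed definition, a few long vector instructions dominate the metric, which is why the distribution of vector lengths matters alongside the instruction mix.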
Vector processing-aware advanced clock-gating techniques for low-power fused multiply-add
The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a retailoring for the mobile market that they are entering now. Floating-point (FP) fused multiply-add (FMA), being a functional unit with high power consumption, deserves special attention. Although clock gating is a well-known method to reduce switching power in synchronous designs, there are unexplored opportunities for its application to vector processors, especially when considering active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector FMA units (VFUs). These techniques ensure power savings without jeopardizing the timing. We evaluate the proposed techniques using both synthetic and "real-world" application-based benchmarking. Using vector masking and vector multilane-aware clock gating, we report power reductions of up to 52%, assuming an active VFU operating at peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector FP instructions. Finally, when evaluating all techniques together, using "real-world" benchmarking, the power reductions are up to 80%. Additionally, in accordance with processor design trends, we perform this research in a fully parameterizable and automated fashion.
The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA 321253 and is supported in part by the European Union (FEDER funds) under contract TIN2015-65316-P. The work of I. Ratkovic was supported by an FPU research grant from the Spanish MECD. Peer reviewed. Postprint (author's final draft).
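To make the idea of mask-aware and multilane-aware clock gating more concrete, here is a toy occupancy model, not the paper's technique and not a power estimate. It only counts how many lane-cycles of a multilane unit carry no active element and are therefore candidates for gating. The function name, the `lanes` parameter, and the assumption that a vector of length `len(mask)` occupies `ceil(len(mask) / lanes)` passes through the lanes are all hypothetical choices for this sketch.

```python
import math
from typing import Sequence


def gated_lane_fraction(mask: Sequence[bool], lanes: int) -> float:
    """Fraction of lane-cycles with no active element for one vector instruction.

    mask[i] is True when element i is active; len(mask) is the vector length.
    Assumption: the unit needs ceil(vl / lanes) passes, each occupying all lanes.
    """
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    vl = len(mask)
    if vl == 0:
        return 0.0
    passes = math.ceil(vl / lanes)
    lane_cycles = passes * lanes          # slots available across all passes
    active = sum(1 for bit in mask if bit)  # slots doing useful work
    return 1.0 - active / lane_cycles


# Toy case: vector length 6 on a 4-lane unit, with two elements masked off.
example_mask = [True, False, True, True, False, True]
print(gated_lane_fraction(example_mask, lanes=4))
```

In this toy case, idle slots come from two sources: masked-off elements, and the tail of the last pass when the vector length is not a multiple of the lane count. The fraction of idle slots should not be read as a power reduction figure.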
Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi
Graph500 is a data-intensive application for high-performance computing, and it is an increasingly important workload because graphs are a core part of most analytic applications. So far there is no work that examines whether Graph500 is suitable for vectorization, mostly due to a lack of vector memory instructions for irregular memory accesses. The Xeon Phi is a massively parallel processor recently released by Intel with new features such as a wide 512-bit vector unit and vector scatter/gather instructions. Thus, the Xeon Phi allows for more efficient parallelization of Graph500 that is combined with vectorization. In this paper we vectorize Graph500 and analyze the impact of vectorization and prefetching on the Xeon Phi. We also show that the combination of parallelization, vectorization and prefetching yields a speedup of 27% over a parallel version with prefetching that does not leverage the vector capabilities of the Xeon Phi.
The research leading to these results has received funding from the European Research Council under the European Union's 7th FP (FP/2007-2013) / ERC GA n. 321253. It has been partially funded by the Spanish Government (TIN2012-34557). Peer reviewed. Postprint (published version).
Spirulina Phycobiliproteins as Food Components and Complements
Spirulina has a documented history of use as a food for more than 1000 years, and has been in production as a dietary supplement for 40 years. Among Spirulina's many bioactive components, the blue protein C-phycocyanin and its linear tetrapyrrole chromophore phycocyanobilin occupy a special place due to their broad possibilities for application in various areas of food technology. The subject of this chapter is up-to-date food applications of these Spirulina components, with a focus on their use as food colorants, additives, nutraceuticals, and dietary supplements. Their other current and future food application possibilities will also be briefly presented and discussed.
POSTER: An Integrated Vector-Scalar Design on an In-order ARM Core
In the low-end mobile processor market, power, energy and area budgets are significantly lower than in other markets (e.g. servers or high-end mobile markets). It has been shown that vector processors are a highly energy-efficient way to increase performance; however, adding support for them incurs area and power overheads that would not be acceptable for low-end mobile processors. In this work, we propose an integrated vector-scalar design for the ARM architecture that mostly reuses scalar hardware to support the execution of vector instructions. The key element of the design is our proposed block-based model of execution that groups vector computational instructions together to execute them in a coordinated manner.
The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA no. 321253 and is supported in part by the European Union (FEDER funds) under contract TIN2015-65316-P. This research has also been supported by the Agency for Management of University and Research Grants (AGAUR - FI-DGR 2014). Peer reviewed. Postprint (author's final draft).
Genotoxic effect of the methanol extract of the plant Cotinus coggygria Scop. in Drosophila melanogaster
Plant extracts that appear to have favorable properties may contain chemical compounds with mutagenic, teratogenic and/or carcinogenic activity, so it is of great importance to include genotoxicity testing in the toxicological evaluation of plant extracts. Using a comet assay on the eukaryotic model organism Drosophila melanogaster under in vivo conditions, the potential genotoxic activity of the methanol extract of the plant Cotinus coggygria Scop. was determined. Treatment with the methanol extract at a concentration of 1% caused no statistically significant changes compared to the negative control. Based on the distribution of comet classes and selected quantitative parameters (% DNA in tail and tail length), it can be concluded that a methanol extract obtained from C. coggygria at a concentration of 1% does not show genotoxic activity.
An integrated vector-scalar design on an in-order ARM core
In the low-end mobile processor market, power, energy, and area budgets are significantly lower than in the server/desktop/laptop/high-end mobile markets. It has been shown that vector processors are a highly energy-efficient way to increase performance; however, adding support for them incurs area and power overheads that would not be acceptable for low-end mobile processors. In this work, we propose an integrated vector-scalar design for the ARM architecture that mostly reuses scalar hardware to support the execution of vector instructions. The key element of the design is our proposed block-based model of execution that groups vector computational instructions together to execute them in a coordinated manner. We implemented a classic vector unit and compared its results against our integrated design. Our integrated design improves the performance (more than 6×) and energy consumption (up to 5×) of a scalar in-order core with negligible area overhead (only 4.7% when using a vector register with 32 elements). In contrast, the area overhead of the classic vector unit can be significant (around 44%) if a dedicated vector floating-point unit is incorporated. Our block-based vector execution outperforms the classic vector unit for all kernels with floating-point data and also consumes less energy. We also complement the integrated design with three energy/performance-efficient techniques that further reduce power and increase performance. The first proposal covers the design and implementation of chaining logic that is optimized to work with the cache hierarchy through vector memory instructions, the second proposal reduces the number of reads/writes from/to the vector register file, and the third idea optimizes complex memory access patterns with the memory shape instruction and unified indexed vector load.
The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA no. 321253 and is supported in part by the European Union (FEDER funds) under contract TIN2015-65316-P. This research has also been supported by the Agency for Management of University and Research Grants (AGAUR - FI-DGR 2014). O. Palomar is funded by a Royal Society Newton International Fellowship. Peer reviewed. Postprint (author's final draft).
- …